Skip to content

Conversation

@nirandaperera
Copy link
Contributor

No description provided.

Signed-off-by: niranda perera <[email protected]>
Signed-off-by: niranda perera <[email protected]>
@copy-pr-bot
Copy link

copy-pr-bot bot commented Aug 13, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@nirandaperera nirandaperera added improvement Improves an existing functionality non-breaking Introduces a non-breaking change DO NOT MERGE Hold off on merging; see PR for details and removed DO NOT MERGE Hold off on merging; see PR for details labels Aug 13, 2025
@nirandaperera
Copy link
Contributor Author

/ok to test

Comment on lines +1358 to +1360
auto [msg_available, info] = shared_resources_->get_worker()->tagProbe(
::ucxx::Tag(static_cast<int>(tag)), UserTagMask
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we would use the new implementation from rapidsai/ucxx#458, thus we won't need to go through the receive queue twice, but we need someone to review that PR.

EXPECT_EQ(cid, chunk_id);

auto data_buf = chunk.release_data_buffer();
data_buf->wait_for_ready();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a good idea, callbacks should be lightweight because they may execute during the UCX worker progress loop, so in the case here you're blocking UCX entirely until wait_for_ready() returns.

));
} else if (comm->rank() == 1) {
auto recv_any_cb = [&](std::unique_ptr<Buffer> buf, Rank sender_rank) {
auto const& recv_buf = br->move_to_host_vector(std::move(buf));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember if this actually moves and waits for data to be ready, if it does it causes the same problem as mentioned above as it blocks UCX progress.

Comment on lines +283 to +284
auto data_buf = br->allocate(chunk.concat_data_size(), stream, reservation);
data_buf->wait_for_ready();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, waiting for an allocation will block the UCX progress.

@pentschev pentschev changed the base branch from branch-25.10 to branch-25.12 September 25, 2025 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants